Conversation
- Fix wall-clock claim: 41% -> 22% to match benchmark table
- Fix dual-model speedup rounding: 1.7x -> 1.6x (10.0/6.1 = 1.64)
- Fix run_config API: use dd.set_run_config() instead of passing to create()
Build MkDocs site on PRs that touch docs and deploy to Cloudflare Pages. Each PR gets a browseable preview URL posted as a comment. Notebook tutorials use placeholder stubs since they require API keys to execute. Requires CLOUDFLARE_API_TOKEN and CLOUDFLARE_ACCOUNT_ID repo secrets.
Greptile Summary

This PR adds the "Async All the Way Down" dev note documenting the async task-queue scheduler and its performance impact, along with companion assets, an author entry, and an updated nav entry in `mkdocs.yml`.
| Filename | Overview |
|---|---|
| docs/devnotes/posts/async-all-the-way-down.md | New dev note covering the async engine; technical claims verified against the codebase — API usage, env vars, and benchmark numbers are all accurate. |
| docs/devnotes/posts/owning-the-model-stack.md | Cross-reference link to the new async-all-the-way-down.md post added; no other content changes. |
| docs/devnotes/.authors.yml | New amanoel author entry added, matching the author slug used in the new post's front matter. |
| mkdocs.yml | New "Async All the Way Down" nav entry inserted at the top of the Dev Notes section (most-recent-first ordering). |
| .github/workflows/docs-preview.yml | Docs preview CI workflow; no changes to build logic visible in this PR diff. |
Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Cell enters Frontier\nwhen upstream deps satisfied] --> B[AsyncTaskScheduler\nacquires submission semaphore slot]
    B --> C{LLM-bound\ntask?}
    C -- Yes --> D[Acquire LLM-wait semaphore\nRelease submission semaphore]
    C -- No --> E[Hold submission slot\nfor full duration]
    D --> F[Generator makes\nLLM request via ThrottledModelClient]
    E --> G[Generator runs\nCPU/non-LLM work]
    F --> H{Provider\nresponse?}
    G --> I[Release submission slot\nMark cell complete]
    H -- 429 --> J[AIMD: cut concurrency\nDefer task to frontier]
    H -- Success --> K[Release LLM-wait slot\nMark cell complete]
    J --> A
    K --> L[CompletionTracker: unlock\ndownstream cells]
    I --> L
    L --> A
    K --> M[Row group complete?\nFlush to Parquet]
```
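The 429 branch in the flowchart follows the classic AIMD (additive-increase, multiplicative-decrease) pattern: grow the concurrency limit one slot per success, halve it on a rate-limit response. A minimal sketch of that policy (the class name and numeric constants are illustrative, not taken from the actual scheduler):

```python
class AIMDConcurrency:
    """Additive-increase / multiplicative-decrease cap on in-flight LLM calls."""

    def __init__(self, initial: int = 8, minimum: int = 1, maximum: int = 64):
        self.limit = initial
        self.minimum = minimum
        self.maximum = maximum

    def on_success(self) -> None:
        # Additive increase: probe for headroom one slot at a time.
        self.limit = min(self.limit + 1, self.maximum)

    def on_rate_limit(self) -> None:
        # Multiplicative decrease: back off hard on a 429.
        self.limit = max(self.limit // 2, self.minimum)


ctl = AIMDConcurrency(initial=8)
ctl.on_success()      # limit: 8 -> 9
ctl.on_rate_limit()   # limit: 9 -> 4
```

The asymmetry is the point: recovery is slow and linear, backoff is immediate, so a burst of 429s quickly drains pressure on the provider while successes cautiously reclaim throughput.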
Reviews (14): Last reviewed commit: "docs: address review feedback on async b..."
Docs preview: https://a54387d5.dd-docs-preview.pages.dev
Force-pushed 7055573 to e434aad
Add DAG subtitle to sync-vs-async timeline figure and bridge the surrounding text to explain which workload shape is being shown.
Regenerate scale-model-timeline and scale-boxplot from nginx access logs (column_progress.csv, sync/summary.json) instead of buffered execution logs. Optimize both PNGs to palette mode. Adjust figure widths and update model timeline commentary.
# **Async All the Way Down**
Every Data Designer pipeline carries a map of what can run in parallel. Consider a pipeline that generates a `topic`, writes a `summary` and a `trivia` fact from that topic, then produces an `analysis` of the summary. `summary` and `trivia` both depend on `topic`, so they could run alongside each other. `analysis` depends on `summary`, so it has to wait — but only on the same row's summary, not the entire column. These references form a per-cell dependency graph. The previous engine used that graph to order columns, but it ran each column to completion before starting the next. A row's `analysis` couldn't start until *every* row of `summary` had finished, even though it only needed its own.
Suggested change: "The previous engine used that graph to order columns" → "Data Designer's original workflow engine used that graph to order columns"
Thanks, adopted your wording with a small tweak: added "within each batch" to clarify that the sync engine already split work into batches; it just ran columns sequentially within each one.
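The topic/summary/trivia/analysis pipeline discussed in this thread can be modeled as a per-cell graph, which is what makes the row-level unlocking possible. A small sketch using the stdlib `graphlib` module (the column names come from the post; the helper function and everything else is illustrative, not the actual engine's code):

```python
from graphlib import TopologicalSorter

# Column-level dependencies from the example pipeline.
deps = {
    "topic": set(),
    "summary": {"topic"},
    "trivia": {"topic"},
    "analysis": {"summary"},
}

def per_cell_graph(deps: dict, n_rows: int) -> dict:
    """Expand column deps to (row, column) nodes. Each cell depends only on
    the SAME row's inputs, so row 0's analysis never waits on row 1's summary."""
    return {
        (row, col): {(row, dep) for dep in col_deps}
        for row in range(n_rows)
        for col, col_deps in deps.items()
    }

ts = TopologicalSorter(per_cell_graph(deps, n_rows=2))
ts.prepare()
frontier = sorted(ts.get_ready())
# frontier == [(0, "topic"), (1, "topic")]: every row's topic is ready at once.
```

Once any single `(row, "summary")` cell is marked done, that row's `(row, "analysis")` joins the frontier immediately, which is exactly the behavior the column-at-a-time engine couldn't express.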
The scheduler maintains a *frontier* — the set of tasks whose inputs are all satisfied. Dispatch is a loop: pull ready tasks from the frontier, acquire a [semaphore](https://en.wikipedia.org/wiki/Semaphore_(programming)) slot, spawn a worker. When the worker completes, mark the cell done, which may add new tasks to the frontier. The loop runs until every cell in every row group has completed or been dropped.
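That dispatch loop reads naturally as asyncio code. A simplified sketch under stated assumptions: `run_cell` is a hypothetical per-cell coroutine, and the real scheduler additionally does the two-semaphore submission/LLM-wait handoff, which is omitted here:

```python
import asyncio
from graphlib import TopologicalSorter

async def run_graph(graph: dict, run_cell, max_in_flight: int = 4) -> list:
    """Frontier-driven dispatch: spawn a worker per ready task, gated by a semaphore."""
    sem = asyncio.Semaphore(max_in_flight)
    ts = TopologicalSorter(graph)
    ts.prepare()
    completed = []

    async def worker(task):
        async with sem:           # acquire a submission slot
            await run_cell(task)  # do the actual cell work
        ts.done(task)             # may push new tasks onto the frontier
        completed.append(task)

    pending = set()
    while ts.is_active():
        for task in ts.get_ready():  # pull newly-ready tasks off the frontier
            pending.add(asyncio.ensure_future(worker(task)))
        if pending:
            _, pending = await asyncio.wait(
                pending, return_when=asyncio.FIRST_COMPLETED
            )
    if pending:
        await asyncio.wait(pending)
    return completed
```

The `FIRST_COMPLETED` wait is the key beat: the loop wakes as soon as any worker finishes, re-queries the frontier, and dispatches whatever that completion just unlocked, rather than waiting for a whole column or batch to drain.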
Two details matter here. Multi-column generators (where one generator produces several output columns) are deduplicated so they run once. And stateful generators like seed dataset readers get per-instance `asyncio.Lock`s to preserve row-group ordering, since the order rows are read from a seed dataset matters.
nit: I feel like this bit about multi-column generators and seed readers might be TMI. I get that we want to give technical details here, but the goal is for users not to need to worry about these deep implementation details. It's also a bit confusing, because the reader would need to understand how generators relate to columns and why we have multi-column generators in the first place.
Good call, removed the paragraph. The two-semaphore discussion below is the interesting detail worth keeping.
- Tighten intro to a concise abstract, move pipeline narrative into "The Bottleneck Was Structural" section
- Remove multi-column generators / seed readers paragraph (TMI)
- Clarify sync engine ran columns sequentially within each batch
📋 Summary
Add "Async All the Way Down" dev note covering the async task-queue scheduler and its impact on Data Designer pipeline performance. Covers the full async engine arc (PRs #356, #378, #404, #429, #456) in a single narrative post with benchmark results and original diagrams.
🔄 Changes
✨ Added
- `docs/devnotes/posts/async-engine.md` - dev note post (~1600 words, slop-guard 93/100)
- `docs/devnotes/posts/assets/async-engine/` - 6 figures (NVIDIA-styled, dark background + green accent)

🔧 Changed

- `docs/devnotes/.authors.yml` - added `amanoel` author entry
- `mkdocs.yml` - added nav entry (most-recent-first position)

🔍 Attention Areas

- `async-engine.md` - technical claims were cross-checked against implementation code (Kahn's algorithm, AIMD, symmetric bridging, semaphores, etc.) and benchmark scripts (DAG shapes, column dependencies). The "At higher record counts" section discusses rate-limiting tradeoffs qualitatively.
- `tmp_blog_content/` (not committed) for reference.

🤖 Generated with AI